Characteristics of the movies in the last 30 years

This time the data set is large. We will analyze the relationship between the score and each of several elements, such as votes, gross, and budget, to understand whether these elements contribute to the score. This matters to us: for example, does a high budget lead to a high score? The answer may affect our analysis.

We will also help Mr. Spielberg review his record of movies. We may be able to get some ideas from the data of the past. We are very excited to work with Mr. Spielberg. He is the best director in the world, so we will do our best to find the data we need and give him a valuable conclusion.

1.)-----------------------------preparation------------------------

1.a -- Import the libraries we need, read the file, and check its headers and structure.

In [1]:
library(tidyverse)
library(pivottabler)
library(plotly)
library(psych)
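# install.packages("ggplot2")  # not needed: tidyverse already attaches ggplot2, and the original name "ggplot2::" is not a valid package name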
options(scipen=999)
─ Attaching packages ──────────────────── tidyverse 1.2.1 ─
✔ ggplot2 3.0.0     ✔ purrr   0.2.5
✔ tibble  1.4.2     ✔ dplyr   0.7.6
✔ tidyr   0.8.1     ✔ stringr 1.3.1
✔ readr   1.1.1     ✔ forcats 0.3.0
─ Conflicts ───────────────────── tidyverse_conflicts() ─
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Attaching package: ‘plotly’

The following object is masked from ‘package:ggplot2’:

    last_plot

The following object is masked from ‘package:stats’:

    filter

The following object is masked from ‘package:graphics’:

    layout


Attaching package: ‘psych’

The following objects are masked from ‘package:ggplot2’:

    %+%, alpha

Warning message:
“package ‘ggplot2::’ is not available (for R version 3.5.1)”
In [2]:
mov <- read.csv("~/project-ionic/R/week4/movies.csv")
head(mov)
names(mov)
summary(mov)
str(mov)
budget | company | country | director | genre | gross | name | rating | released | runtime | score | star | votes | writer | year
8000000 | Columbia Pictures Corporation | USA | Rob Reiner | Adventure | 52287414 | Stand by Me | R | 1986-08-22 | 89 | 8.1 | Wil Wheaton | 299174 | Stephen King | 1986
6000000 | Paramount Pictures | USA | John Hughes | Comedy | 70136369 | Ferris Bueller's Day Off | PG-13 | 1986-06-11 | 103 | 7.8 | Matthew Broderick | 264740 | John Hughes | 1986
15000000 | Paramount Pictures | USA | Tony Scott | Action | 179800601 | Top Gun | PG | 1986-05-16 | 110 | 6.9 | Tom Cruise | 236909 | Jim Cash | 1986
18500000 | Twentieth Century Fox Film Corporation | USA | James Cameron | Action | 85160248 | Aliens | R | 1986-07-18 | 137 | 8.4 | Sigourney Weaver | 540152 | James Cameron | 1986
9000000 | Walt Disney Pictures | USA | Randal Kleiser | Adventure | 18564613 | Flight of the Navigator | PG | 1986-08-01 | 90 | 6.9 | Joey Cramer | 36636 | Mark H. Baker | 1986
6000000 | Hemdale | UK | Oliver Stone | Drama | 138530565 | Platoon | R | 1987-02-06 | 120 | 8.1 | Charlie Sheen | 317585 | Oliver Stone | 1986
  1. 'budget'
  2. 'company'
  3. 'country'
  4. 'director'
  5. 'genre'
  6. 'gross'
  7. 'name'
  8. 'rating'
  9. 'released'
  10. 'runtime'
  11. 'score'
  12. 'star'
  13. 'votes'
  14. 'writer'
  15. 'year'
     budget                                            company    
 Min.   :        0   Universal Pictures                    : 302  
 1st Qu.:        0   Warner Bros.                          : 294  
 Median : 11000000   Paramount Pictures                    : 259  
 Mean   : 24581129   Twentieth Century Fox Film Corporation: 205  
 3rd Qu.: 32000000   New Line Cinema                       : 172  
 Max.   :300000000   Columbia Pictures Corporation         : 166  
                     (Other)                               :5422  
      country                  director          genre          gross          
 USA      :4872   Woody Allen      :  33   Comedy   :2080   Min.   :       70  
 UK       : 698   Clint Eastwood   :  24   Drama    :1444   1st Qu.:  1515839  
 France   : 283   Steven Soderbergh:  21   Action   :1331   Median : 12135679  
 Canada   : 150   Steven Spielberg :  21   Crime    : 522   Mean   : 33497829  
 Germany  : 134   Ron Howard       :  20   Adventure: 392   3rd Qu.: 40065340  
 Australia:  82   Joel Schumacher  :  19   Biography: 359   Max.   :936662225  
 (Other)  : 601   (Other)          :6682   (Other)  : 692                      
                  name            rating           released       runtime     
 Hamlet             :   3   R        :3392   1991-10-04:  10   Min.   : 50.0  
 Pulse              :   3   PG-13    :1995   1988-10-21:   9   1st Qu.: 95.0  
 Anna Karenina      :   2   PG       : 951   1988-11-18:   9   Median :102.0  
 Bad Company        :   2   NOT RATED: 174   2008-09-26:   9   Mean   :106.6  
 Beautiful Creatures:   2   G        : 147   1986-08-22:   8   3rd Qu.:115.0  
 Behind Enemy Lines :   2   UNRATED  :  71   1986-11-07:   8   Max.   :366.0  
 (Other)            :6806   (Other)  :  90   (Other)   :6767                  
     score                      star          votes        
 Min.   :1.500   Nicolas Cage     :  42   Min.   :     27  
 1st Qu.:5.800   Robert De Niro   :  38   1st Qu.:   7665  
 Median :6.400   Denzel Washington:  36   Median :  25892  
 Mean   :6.375   Tom Hanks        :  35   Mean   :  71220  
 3rd Qu.:7.100   Bruce Willis     :  33   3rd Qu.:  75812  
 Max.   :9.300   Johnny Depp      :  32   Max.   :1861666  
                 (Other)          :6604                    
                 writer          year     
 Woody Allen        :  32   Min.   :1986  
 Luc Besson         :  25   1st Qu.:1993  
 Stephen King       :  22   Median :2001  
 John Hughes        :  18   Mean   :2001  
 David Mamet        :  14   3rd Qu.:2009  
 William Shakespeare:  14   Max.   :2016  
 (Other)            :6695                 
'data.frame':	6820 obs. of  15 variables:
 $ budget  : num  8000000 6000000 15000000 18500000 9000000 6000000 25000000 6000000 9000000 15000000 ...
 $ company : Factor w/ 2179 levels "\"DIA\" Productions GmbH & Co. KG",..: 665 1683 1683 2068 2124 1159 1161 763 1683 1936 ...
 $ country : Factor w/ 57 levels "Argentina","Aruba",..: 56 56 56 56 56 54 54 56 56 56 ...
 $ director: Factor w/ 2759 levels "\xc1lex de la Iglesia",..: 2200 1305 2652 1074 2131 1955 1215 594 1011 563 ...
 $ genre   : Factor w/ 17 levels "Action","Adventure",..: 2 5 1 1 2 7 2 7 5 7 ...
 $ gross   : num  52287414 70136369 179800601 85160248 18564613 ...
 $ name    : Factor w/ 6731 levels "'71","'night, Mother",..: 4671 1829 6211 299 1880 3911 2892 784 3969 5313 ...
 $ rating  : Factor w/ 13 levels "B","B15","G",..: 9 8 7 9 7 9 7 9 8 9 ...
 $ released: Factor w/ 2403 levels "1986-01-10","1986-01-17",..: 40 28 24 34 37 76 31 52 10 39 ...
 $ runtime : int  89 103 110 137 90 120 101 120 96 96 ...
 $ score   : num  8.1 7.8 6.9 8.4 6.9 8.1 7.4 7.8 6.8 7.5 ...
 $ star    : Factor w/ 2504 levels "'Weird Al' Yankovic",..: 2458 1609 2350 2197 1146 374 531 928 1734 1045 ...
 $ votes   : int  299174 264740 236909 540152 36636 317585 102879 146768 60565 129698 ...
 $ writer  : Factor w/ 4199 levels "'Weird Al' Yankovic",..: 3728 1981 1862 1639 2559 2996 984 904 1981 1343 ...
 $ year    : int  1986 1986 1986 1986 1986 1986 1986 1986 1986 1986 ...
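
The level "\xc1lex de la Iglesia" in the str() output suggests that some names are stored in a single-byte encoding rather than UTF-8. Below is a minimal sketch of one way to repair such a string, assuming the bytes are Latin-1 (an assumption; the file itself does not say). The names raw_name and fixed_name are illustrative only.

```r
# Hypothetical example: one director name as it appears in the str() output,
# with its first letter stored as the single byte 0xC1.
raw_name <- "\xc1lex de la Iglesia"

# validUTF8() reports whether a string is well-formed UTF-8.
print(validUTF8(raw_name))

# Assumption: the bytes are Latin-1. iconv() re-encodes them as UTF-8.
fixed_name <- iconv(raw_name, from = "latin1", to = "UTF-8")
print(validUTF8(fixed_name))
print(fixed_name)
```

If the assumption holds, the same conversion could be applied to each text column after reading the file; read.csv() also accepts a fileEncoding argument for declaring the encoding up front.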

2.)-------------------Four aspects----------------------------------------

2.a-1 -- Year, genre, and rating distribution

-- First, check the number of movies per year

In [3]:
year_num <- mov %>% group_by(year) %>% summarize(num = n())

p <- plot_ly(year_num, x = ~year, y = ~num, name = 'Movies per year', type = 'scatter',
       mode = 'lines')
p

- From 1990 onward, the number of movies per year in the data stays the same.

2.a-2 -- Which genres have the highest share of the data.

-- Use with() to draw a 3D plot.

In [4]:
# with() supplies the mov columns directly, so the stray iris data frame is dropped
with(mov, plot_ly(x = genre, y = rating, z = year,
                  size = year, color = genre,
                  type="scatter3d", mode="markers"))
Warning message:
“`line.width` does not currently support multiple values.”
Warning message in RColorBrewer::brewer.pal(N, "Set2"):
“n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
”
In [5]:
my_cols <- c("purple", "violetred1", "green3","red", "cyan", "steelblue","white","orange","pink","gray","black","maroon",
                                    "coral","forestgreen","darkblue","lavender","ivory")
pairs(mov[,1:5], pch = 19,  cex = 0.5,
      col = my_cols[mov$genre],
      lower.panel=NULL)
In [6]:
library(RColorBrewer)
barplot(table(mov$runtime,mov$score),col  = brewer.pal(4,"Set1"))
In [7]:
genrePie <- mov %>% group_by(genre) %>% summarize(num = n())
genrePie

# Percentage share of each genre, computed from the counts instead of typed by hand
pct <- round(genrePie$num / sum(genrePie$num) * 100, 2)
labels <- paste(genrePie$genre, pct, "%")
col <- c("purple", "violetred1", "green3", "red", "cyan", "steelblue", "white", "orange", "pink", "gray", "black", "maroon",
         "coral", "forestgreen", "darkblue", "lavender", "ivory")

pie(pct, col = col, radius = 0.8, init.angle = 180, clockwise = TRUE,
    labels = labels, main = "Genre Percentage")
genre num
Action 1331
Adventure 392
Animation 277
Biography 359
Comedy 2080
Crime 522
Drama 1444
Family 14
Fantasy 32
Horror 277
Musical 4
Mystery 38
Romance 15
Sci-Fi 13
Thriller 18
War 2
Western 2
In [8]:
plot_ly(genrePie, x = ~genre, y = ~num, type = 'bar', name = 'Movies') %>%
 layout(yaxis = list(title = 'Count'))

-- The distributions above show that most movies in our data carry an R, PG-13, or PG rating, and that the largest genres are Comedy (2080), Drama (1444), Action (1331), and Crime (522). The pie chart above shows the shares in detail. So we can say that Comedy, Drama, Action, and Crime are the mainstream of movies in the last 30 years.
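
The genre shares can be checked directly from the counts printed in the genrePie output. This is a small sketch using base R only; genre_counts, genre_pct, and top4_share are illustrative names, and the counts are copied by hand from that output.

```r
# Genre counts copied from the genrePie output above.
genre_counts <- c(Action = 1331, Adventure = 392, Animation = 277,
                  Biography = 359, Comedy = 2080, Crime = 522, Drama = 1444,
                  Family = 14, Fantasy = 32, Horror = 277, Musical = 4,
                  Mystery = 38, Romance = 15, `Sci-Fi` = 13, Thriller = 18,
                  War = 2, Western = 2)

# prop.table() turns counts into shares that sum to 1.
genre_pct <- round(prop.table(genre_counts) * 100, 2)

# Largest genres first.
top_genres <- sort(genre_pct, decreasing = TRUE)
print(head(top_genres, 4))

# Combined share of the four largest genres, in percent.
top4_share <- sum(head(sort(genre_counts, decreasing = TRUE), 4)) /
  sum(genre_counts) * 100
print(round(top4_share, 2))
```

Working from counts rather than typed percentages keeps the labels and the numbers in step if the data changes.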

2.b --Relationship of Score and Budget

In [9]:
head(mov[order(mov$score, decreasing = T), ], 10)
mean(mov$score)
# type = "p" draws points; the original type = "point" is only truncated to "p", with a warning
plot(mov$score, mov$budget, type = "p",
     pch = 21, bg = "yellow", xlab = "Score", ylab = "Budget",
     main = "Score and budget")
row | budget | company | country | director | genre | gross | name | rating | released | runtime | score | star | votes | writer | year
1761 | 25000000 | Castle Rock Entertainment | USA | Frank Darabont | Crime | 28341469 | The Shawshank Redemption | R | 1994-10-14 | 142 | 9.3 | Tim Robbins | 1861666 | Stephen King | 1994
4841 | 185000000 | Warner Bros. | USA | Christopher Nolan | Action | 534858444 | The Dark Knight | PG-13 | 2008-07-18 | 152 | 9.0 | Christian Bale | 1839571 | Jonathan Nolan | 2008
1543 | 22000000 | Universal Pictures | USA | Steven Spielberg | Biography | 96067179 | Schindler's List | R | 1994-02-04 | 195 | 8.9 | Liam Neeson | 956124 | Thomas Keneally | 1993
1762 | 8000000 | Miramax | USA | Quentin Tarantino | Crime | 107928762 | Pulp Fiction | R | 1994-10-14 | 154 | 8.9 | John Travolta | 1456787 | Quentin Tarantino | 1994
3744 | 94000000 | New Line Cinema | USA | Peter Jackson | Adventure | 377845905 | The Lord of the Rings: The Return of the King | PG-13 | 2003-12-17 | 201 | 8.9 | Elijah Wood | 1332020 | J.R.R. Tolkien | 2003
1763 | 55000000 | Paramount Pictures | USA | Robert Zemeckis | Comedy | 330252182 | Forrest Gump | PG-13 | 1994-07-06 | 142 | 8.8 | Tom Hanks | 1402876 | Winston Groom | 1994
2861 | 63000000 | Fox 2000 Pictures | USA | David Fincher | Drama | 37030102 | Fight Club | R | 1999-10-15 | 139 | 8.8 | Brad Pitt | 1492073 | Chuck Palahniuk | 1999
3302 | 93000000 | New Line Cinema | New Zealand | Peter Jackson | Adventure | 315544750 | The Lord of the Rings: The Fellowship of the Ring | PG-13 | 2001-12-19 | 178 | 8.8 | Elijah Wood | 1352483 | J.R.R. Tolkien | 2001
5281 | 160000000 | Warner Bros. | USA | Christopher Nolan | Action | 292576195 | Inception | PG-13 | 2010-07-16 | 148 | 8.8 | Leonardo DiCaprio | 1629342 | Christopher Nolan | 2010
882 | 25000000 | Warner Bros. | USA | Martin Scorsese | Crime | 46836394 | Goodfellas | R | 1990-09-21 | 146 | 8.7 | Robert De Niro | 802599 | Nicholas Pileggi | 1990
6.37489736070381
Warning message in plot.xy(xy, type, ...):
“plot type 'point' will be truncated to first character”
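
The summary() output earlier shows a minimum and a first quartile of 0 for budget, so many budgets are recorded as zero. Below is a minimal sketch of how the link between budget and score could be measured while leaving those rows out. It assumes a zero budget means "unknown" rather than a film that cost nothing, which the file does not confirm. The data frame budget_sample holds eight rows copied from tables in this notebook, so the result only illustrates the method and is not a finding.

```r
# Eight rows copied from tables in this notebook (illustration only).
budget_sample <- data.frame(
  name   = c("The Shawshank Redemption", "The Dark Knight", "Schindler's List",
             "Pulp Fiction", "Dangal", "The Intouchables", "Your name", "Whiplash"),
  budget = c(25000000, 185000000, 22000000, 8000000, 0, 0, 0, 3300000),
  score  = c(9.3, 9.0, 8.9, 8.9, 8.7, 8.6, 8.5, 8.5),
  stringsAsFactors = FALSE
)

# Assumption: budget == 0 means missing, so recode it as NA.
budget_sample$budget[budget_sample$budget == 0] <- NA
n_missing <- sum(is.na(budget_sample$budget))

# Pearson correlation on the rows that still have a budget.
r_budget <- cor(budget_sample$budget, budget_sample$score, use = "complete.obs")
print(n_missing)
print(r_budget)
```

On the full data the same two steps would apply to mov; leaving the zeros in would pull the correlation toward whatever scores the zero-budget rows happen to have.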

2.c --Relationship of Score and Votes

In [10]:
head(mov[order(mov$votes, decreasing = T), ], 10)
mean(mov$votes)
qplot(score, votes, data = mov, geom = c("point", "line"), color = I("red"))
row | budget | company | country | director | genre | gross | name | rating | released | runtime | score | star | votes | writer | year
1761 | 25000000 | Castle Rock Entertainment | USA | Frank Darabont | Crime | 28341469 | The Shawshank Redemption | R | 1994-10-14 | 142 | 9.3 | Tim Robbins | 1861666 | Stephen King | 1994
4841 | 185000000 | Warner Bros. | USA | Christopher Nolan | Action | 534858444 | The Dark Knight | PG-13 | 2008-07-18 | 152 | 9.0 | Christian Bale | 1839571 | Jonathan Nolan | 2008
5281 | 160000000 | Warner Bros. | USA | Christopher Nolan | Action | 292576195 | Inception | PG-13 | 2010-07-16 | 148 | 8.8 | Leonardo DiCaprio | 1629342 | Christopher Nolan | 2010
2861 | 63000000 | Fox 2000 Pictures | USA | David Fincher | Drama | 37030102 | Fight Club | R | 1999-10-15 | 139 | 8.8 | Brad Pitt | 1492073 | Chuck Palahniuk | 1999
1762 | 8000000 | Miramax | USA | Quentin Tarantino | Crime | 107928762 | Pulp Fiction | R | 1994-10-14 | 154 | 8.9 | John Travolta | 1456787 | Quentin Tarantino | 1994
1763 | 55000000 | Paramount Pictures | USA | Robert Zemeckis | Comedy | 330252182 | Forrest Gump | PG-13 | 1994-07-06 | 142 | 8.8 | Tom Hanks | 1402876 | Winston Groom | 1994
3302 | 93000000 | New Line Cinema | New Zealand | Peter Jackson | Adventure | 315544750 | The Lord of the Rings: The Fellowship of the Ring | PG-13 | 2001-12-19 | 178 | 8.8 | Elijah Wood | 1352483 | J.R.R. Tolkien | 2001
2863 | 63000000 | Warner Bros. | USA | Lana Wachowski | Action | 171479930 | The Matrix | R | 1999-03-31 | 136 | 8.7 | Keanu Reeves | 1339820 | Lilly Wachowski | 1999
3744 | 94000000 | New Line Cinema | USA | Peter Jackson | Adventure | 377845905 | The Lord of the Rings: The Return of the King | PG-13 | 2003-12-17 | 201 | 8.9 | Elijah Wood | 1332020 | J.R.R. Tolkien | 2003
5721 | 250000000 | Warner Bros. | UK | Christopher Nolan | Action | 448139099 | The Dark Knight Rises | PG-13 | 2012-07-20 | 164 | 8.4 | Christian Bale | 1253772 | Jonathan Nolan | 2012
71219.5225806452

- According to this chart, in which the line rises gradually, movies with more votes tend to have higher scores, so the number of votes is closely tied to a movie's score.
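
A correlation coefficient puts a number on the trend read from the chart. The sketch below uses only the six rows shown in the head(mov) output, so the values illustrate the calculation and say nothing reliable about the full data. The name votes_sample is made up for this example. A correlation describes association only; it does not show that votes cause the score.

```r
# Six rows copied from the head(mov) output near the top of this notebook.
votes_sample <- data.frame(
  name  = c("Stand by Me", "Ferris Bueller's Day Off", "Top Gun",
            "Aliens", "Flight of the Navigator", "Platoon"),
  votes = c(299174, 264740, 236909, 540152, 36636, 317585),
  score = c(8.1, 7.8, 6.9, 8.4, 6.9, 8.1),
  stringsAsFactors = FALSE
)

# Pearson measures linear association. Spearman works on ranks, so a few
# movies with very large vote counts influence it less.
r_pearson  <- cor(votes_sample$votes, votes_sample$score, method = "pearson")
r_spearman <- cor(votes_sample$votes, votes_sample$score, method = "spearman")
print(c(pearson = r_pearson, spearman = r_spearman))
```

On the full data the same calls would take mov$votes and mov$score. Because votes are heavily skewed (median 25892, maximum 1861666 in the summary), the rank-based version may be the more informative of the two.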

2.d --Relationship of Score and Gross

In [11]:
head(mov[order(mov$gross, decreasing = T), ], 10)
mean(mov$gross)
qplot(score, gross, data = mov, geom = c("line"), color = I("blue"))
row | budget | company | country | director | genre | gross | name | rating | released | runtime | score | star | votes | writer | year
6381 | 245000000 | Lucasfilm | USA | J.J. Abrams | Action | 936662225 | Star Wars: The Force Awakens | PG-13 | 2015-12-18 | 136 | 8.1 | Daisy Ridley | 687192 | Lawrence Kasdan | 2015
5062 | 237000000 | Twentieth Century Fox Film Corporation | UK | James Cameron | Action | 760507625 | Avatar | PG-13 | 2009-12-18 | 162 | 7.8 | Sam Worthington | 954412 | James Cameron | 2009
2421 | 200000000 | Twentieth Century Fox Film Corporation | USA | James Cameron | Drama | 658672302 | Titanic | PG-13 | 1997-12-19 | 194 | 7.8 | Leonardo DiCaprio | 862554 | James Cameron | 1997
6392 | 150000000 | Universal Pictures | USA | Colin Trevorrow | Action | 652270625 | Jurassic World | PG-13 | 2015-06-12 | 124 | 7.0 | Chris Pratt | 469200 | Rick Jaffa | 2015
5724 | 220000000 | Marvel Studios | USA | Joss Whedon | Action | 623357910 | The Avengers | PG-13 | 2012-05-04 | 143 | 8.1 | Robert Downey Jr. | 1064633 | Joss Whedon | 2012
4841 | 185000000 | Warner Bros. | USA | Christopher Nolan | Action | 534858444 | The Dark Knight | PG-13 | 2008-07-18 | 152 | 9.0 | Christian Bale | 1839571 | Jonathan Nolan | 2008
6615 | 200000000 | Lucasfilm | USA | Gareth Edwards | Action | 532177324 | Rogue One | PG-13 | 2016-12-16 | 133 | 7.9 | Felicity Jones | 365473 | Chris Weitz | 2016
6688 | 200000000 | Pixar Animation Studios | USA | Andrew Stanton | Animation | 486295561 | Finding Dory | PG | 2016-06-17 | 97 | 7.4 | Ellen DeGeneres | 173005 | Andrew Stanton | 2016
2871 | 115000000 | Lucasfilm | USA | George Lucas | Action | 474544677 | Star Wars: Episode I - The Phantom Menace | PG | 1999-05-19 | 136 | 6.5 | Ewan McGregor | 584809 | George Lucas | 1999
6399 | 250000000 | Marvel Studios | USA | Joss Whedon | Action | 459005868 | Avengers: Age of Ultron | PG-13 | 2015-05-01 | 141 | 7.4 | Robert Downey Jr. | 537832 | Joss Whedon | 2015
33497828.6155425

3.) --------------------Steven Spielberg's movies------------------------

- This helps us understand our director. First, make a subset of Steven Spielberg's movies.

In [12]:
Ste <- mov %>% filter(director == "Steven Spielberg")

head(Ste[order(Ste$score, decreasing = T), ], 10)

pairs.panels(Ste[,-5], 
             method = "pearson", # correlation method
             hist.col = "#00AFBB",
             density = TRUE,  # show density plots
             ellipses = TRUE # show correlation ellipses
)
Warning message in FUN(X[[i]], ...):
“input string 14 is invalid in this locale”
ERROR while rich displaying an object: Error in gsub(" &\\", "\\", r, fixed = TRUE): input string 1 is invalid in this locale

Traceback:
1. FUN(X[[i]], ...)
2. tryCatch(withCallingHandlers({
 .     rpr <- mime2repr[[mime]](obj)
 .     if (is.null(rpr)) 
 .         return(NULL)
 .     prepare_content(is.raw(rpr), rpr)
 . }, error = error_handler), error = outer_handler)
3. tryCatchList(expr, classes, parentenv, handlers)
4. tryCatchOne(expr, names, parentenv, handlers[[1L]])
5. doTryCatch(return(expr), name, parentenv, handler)
6. withCallingHandlers({
 .     rpr <- mime2repr[[mime]](obj)
 .     if (is.null(rpr)) 
 .         return(NULL)
 .     prepare_content(is.raw(rpr), rpr)
 . }, error = error_handler)
7. mime2repr[[mime]](obj)
8. repr_latex.data.frame(obj)
9. gsub(" &\\", "\\", r, fixed = TRUE)
Warning message in grepl("<html.*>", data[["text/html"]], ignore.case = TRUE):
“input string 1 is invalid in this locale”
row | budget | company | country | director | genre | gross | name | rating | released | runtime | score | star | votes | writer | year
5 | 22000000 | Universal Pictures | USA | Steven Spielberg | Biography | 96067179 | Schindler's List | R | 1994-02-04 | 195 | 8.9 | Liam Neeson | 956124 | Thomas Keneally | 1993
9 | 70000000 | DreamWorks | USA | Steven Spielberg | Drama | 216540909 | Saving Private Ryan | R | 1998-07-24 | 169 | 8.6 | Tom Hanks | 979007 | Robert Rodat | 1998
2 | 48000000 | Paramount Pictures | USA | Steven Spielberg | Action | 197171806 | Indiana Jones and the Last Crusade | PG-13 | 1989-05-24 | 127 | 8.3 | Harrison Ford | 566432 | Jeffrey Boam | 1989
6 | 63000000 | Universal Pictures | USA | Steven Spielberg | Adventure | 402453882 | Jurassic Park | PG-13 | 1993-06-11 | 127 | 8.1 | Sam Neill | 685270 | Michael Crichton | 1993
11 | 52000000 | DreamWorks | USA | Steven Spielberg | Biography | 164615351 | Catch Me If You Can | PG-13 | 2002-12-25 | 141 | 8.1 | Leonardo DiCaprio | 611179 | Jeff Nathanson | 2002
1 | 35000000 | Amblin Entertainment | USA | Steven Spielberg | Drama | 22238696 | Empire of the Sun | PG | 1987-12-25 | 153 | 7.8 | Christian Bale | 96566 | Tom Stoppard | 1987
12 | 102000000 | Twentieth Century Fox Film Corporation | USA | Steven Spielberg | Action | 132072926 | Minority Report | PG-13 | 2002-06-21 | 145 | 7.7 | Tom Cruise | 430579 | Philip K. Dick | 2002
15 | 70000000 | DreamWorks | France | Steven Spielberg | Crime | 47403685 | Munich | R | 2006-01-06 | 164 | 7.6 | Eric Bana | 186355 | Tony Kushner | 2005
20 | 40000000 | DreamWorks | USA | Steven Spielberg | Drama | 72313754 | Bridge of Spies | PG-13 | 2015-10-16 | 142 | 7.6 | Tom Hanks | 227862 | Matt Charman | 2015
17 | 135000000 | Columbia Pictures | USA | Steven Spielberg | Animation | 77591831 | The Adventures of Tintin | PG | 2011-12-21 | 107 | 7.4 | Jamie Bell | 188070 | Hergé | 2011
Warning message in cor(x, y, use = "pairwise", method = method):
“the standard deviation is zero”
Warning message in min(diff(breaks)):
“no non-missing arguments to min; returning Inf”
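
The "standard deviation is zero" warning is what cor() reports when one of its inputs is constant. In a subset for a single director, a column such as director holds only one value, which is a likely cause here (an inference from the output, not something the notebook confirms). The sketch below shows one way to avoid it by keeping only numeric columns before computing correlations. The data frame ste_sample holds five rows copied from the Spielberg table and is illustrative only.

```r
# Five rows copied from the Spielberg table in this section.
ste_sample <- data.frame(
  director = rep("Steven Spielberg", 5),
  budget   = c(22000000, 70000000, 48000000, 63000000, 52000000),
  gross    = c(96067179, 216540909, 197171806, 402453882, 164615351),
  runtime  = c(195, 169, 127, 127, 141),
  score    = c(8.9, 8.6, 8.3, 8.1, 8.1),
  stringsAsFactors = FALSE
)

# Keep numeric columns only; the constant director column is dropped.
is_num  <- vapply(ste_sample, is.numeric, logical(1))
ste_num <- ste_sample[, is_num]

# Pairwise Pearson correlations between the remaining columns.
cor_mat <- cor(ste_num, method = "pearson")
print(round(cor_mat, 2))
```

The same selection could be passed to pairs.panels() in place of Ste[,-5], so that the plot matrix contains only columns for which a correlation is meaningful.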

3.a -- Number of movies by score and genre.

In [13]:
qhpvt(Ste, "score", "genre", "n()")
qplot(genre,score, data = Ste,
geom = c("point", "line"))

3.b -- Number of movies by company and genre.

In [14]:
qhpvt(Ste, "company", "genre", "n()")

DreamW <- mov %>% filter(company == "DreamWorks")

mean(DreamW$score)
ggplot(DreamW, aes(x = year, y = score, color = genre)) +
  geom_point(alpha = 0.5) + coord_fixed() +
  labs(title = "DreamWorks", x = "Year", y = "Score")


plot_ly(data = DreamW, x = ~year, y = ~score, color = ~genre)


universalP <- mov %>% filter(company == "Universal Pictures")
ggplot(universalP, aes(x = year, y = score, color = genre)) +
  geom_point(alpha = 0.5) + coord_fixed() +
  labs(title = "Universal Pictures", x = "Year", y = "Score")


plot_ly(data = universalP, x = ~year, y = ~score, color = ~genre)
6.65921052631579
No trace type specified:
  Based on info supplied, a 'scatter' trace seems appropriate.
  Read more about this trace type -> https://plot.ly/r/reference/#scatter
No scatter mode specifed:
  Setting the mode to markers
  Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
Warning message in RColorBrewer::brewer.pal(N, "Set2"):
“n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
”

- We also checked the scores of past movies from two big companies, DreamWorks and Universal Pictures. The average scores of the two companies are almost the same.
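
The cell above prints only mean(DreamW$score). A per-company summary makes the comparison explicit. The sketch below uses base R on a hand-copied sample, company_sample, that contains only the Spielberg movies for these two companies listed in the table earlier in this section, so its means differ from the whole-company average of 6.66 printed above.

```r
# Spielberg rows for the two companies, copied from the table in this section.
company_sample <- data.frame(
  company = c("Universal Pictures", "DreamWorks", "Universal Pictures",
              "DreamWorks", "DreamWorks", "DreamWorks"),
  name    = c("Schindler's List", "Saving Private Ryan", "Jurassic Park",
              "Catch Me If You Can", "Munich", "Bridge of Spies"),
  score   = c(8.9, 8.6, 8.1, 8.1, 7.6, 7.6),
  stringsAsFactors = FALSE
)

# Mean score and number of movies per company.
mean_by_company <- tapply(company_sample$score, company_sample$company, mean)
n_by_company    <- table(company_sample$company)
print(round(mean_by_company, 2))
print(n_by_company)
```

Reporting the count next to each mean matters here: an average over two movies is much less stable than an average over several hundred.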

3.c -- Number of movies by country and genre.

In [15]:
qhpvt(Ste, "country", "genre", "n()")

3.d -- The average runtime.

In [16]:
qhpvt(Ste, "runtime", "score", "n()")

ggplot(Ste, aes(x = score, y = runtime, colour = score)) + stat_density2d()

- In this section, we learned that Mr. Spielberg has made 21 movies (20 in the USA, 1 in France) in the last 30 years. Nine of them score above 7.5. He likes to work with DreamWorks and Universal Pictures. In addition, the average runtime of his high-score movies is around 150 minutes. The two big companies that have worked with Mr. Spielberg are good choices to work with again.
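
The two figures in this summary can be recomputed from the ten rows in the Spielberg table above. Because that table is sorted by score and its tenth row is 7.4, it already contains every movie above 7.5. The names ste_top, n_high, and mean_runtime_high are illustrative.

```r
# Score and runtime of the ten Spielberg movies listed in the table above.
ste_top <- data.frame(
  name    = c("Schindler's List", "Saving Private Ryan",
              "Indiana Jones and the Last Crusade", "Jurassic Park",
              "Catch Me If You Can", "Empire of the Sun", "Minority Report",
              "Munich", "Bridge of Spies", "The Adventures of Tintin"),
  score   = c(8.9, 8.6, 8.3, 8.1, 8.1, 7.8, 7.7, 7.6, 7.6, 7.4),
  runtime = c(195, 169, 127, 127, 141, 153, 145, 164, 142, 107),
  stringsAsFactors = FALSE
)

# Movies scoring above 7.5 and their average runtime in minutes.
high <- ste_top[ste_top$score > 7.5, ]
n_high <- nrow(high)
mean_runtime_high <- mean(high$runtime)
print(n_high)
print(round(mean_runtime_high, 1))
```

With the full subset, the same two lines would run on Ste instead of the hand-copied sample.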

4.) ------------------- Segmentation---------------------------------

- We will use the results from above to analyze the data in segments.

4.a - First, we analyze the high-score movies of the last 30 years

In [17]:
head(mov[order(mov$score, decreasing = T),], 10)
row | budget | company | country | director | genre | gross | name | rating | released | runtime | score | star | votes | writer | year
1761 | 25000000 | Castle Rock Entertainment | USA | Frank Darabont | Crime | 28341469 | The Shawshank Redemption | R | 1994-10-14 | 142 | 9.3 | Tim Robbins | 1861666 | Stephen King | 1994
4841 | 185000000 | Warner Bros. | USA | Christopher Nolan | Action | 534858444 | The Dark Knight | PG-13 | 2008-07-18 | 152 | 9.0 | Christian Bale | 1839571 | Jonathan Nolan | 2008
1543 | 22000000 | Universal Pictures | USA | Steven Spielberg | Biography | 96067179 | Schindler's List | R | 1994-02-04 | 195 | 8.9 | Liam Neeson | 956124 | Thomas Keneally | 1993
1762 | 8000000 | Miramax | USA | Quentin Tarantino | Crime | 107928762 | Pulp Fiction | R | 1994-10-14 | 154 | 8.9 | John Travolta | 1456787 | Quentin Tarantino | 1994
3744 | 94000000 | New Line Cinema | USA | Peter Jackson | Adventure | 377845905 | The Lord of the Rings: The Return of the King | PG-13 | 2003-12-17 | 201 | 8.9 | Elijah Wood | 1332020 | J.R.R. Tolkien | 2003
1763 | 55000000 | Paramount Pictures | USA | Robert Zemeckis | Comedy | 330252182 | Forrest Gump | PG-13 | 1994-07-06 | 142 | 8.8 | Tom Hanks | 1402876 | Winston Groom | 1994
2861 | 63000000 | Fox 2000 Pictures | USA | David Fincher | Drama | 37030102 | Fight Club | R | 1999-10-15 | 139 | 8.8 | Brad Pitt | 1492073 | Chuck Palahniuk | 1999
3302 | 93000000 | New Line Cinema | New Zealand | Peter Jackson | Adventure | 315544750 | The Lord of the Rings: The Fellowship of the Ring | PG-13 | 2001-12-19 | 178 | 8.8 | Elijah Wood | 1352483 | J.R.R. Tolkien | 2001
5281 | 160000000 | Warner Bros. | USA | Christopher Nolan | Action | 292576195 | Inception | PG-13 | 2010-07-16 | 148 | 8.8 | Leonardo DiCaprio | 1629342 | Christopher Nolan | 2010
882 | 25000000 | Warner Bros. | USA | Martin Scorsese | Crime | 46836394 | Goodfellas | R | 1990-09-21 | 146 | 8.7 | Robert De Niro | 802599 | Nicholas Pileggi | 1990

4.b - Many of the top-ranked movies are old; the movie at the top was released in 1994. So I decided to analyze only the data from 2010 onward.

In [18]:
YMov <- filter(mov, year >= 2010)
head(YMov[order(YMov$score, decreasing = T),], 10)
Warning message in FUN(X[[i]], ...):
“input string 12 is invalid in this locale”
ERROR while rich displaying an object: Error in gsub(" &\\", "\\", r, fixed = TRUE): input string 1 is invalid in this locale

Traceback:
1. FUN(X[[i]], ...)
2. tryCatch(withCallingHandlers({
 .     rpr <- mime2repr[[mime]](obj)
 .     if (is.null(rpr)) 
 .         return(NULL)
 .     prepare_content(is.raw(rpr), rpr)
 . }, error = error_handler), error = outer_handler)
3. tryCatchList(expr, classes, parentenv, handlers)
4. tryCatchOne(expr, names, parentenv, handlers[[1L]])
5. doTryCatch(return(expr), name, parentenv, handler)
6. withCallingHandlers({
 .     rpr <- mime2repr[[mime]](obj)
 .     if (is.null(rpr)) 
 .         return(NULL)
 .     prepare_content(is.raw(rpr), rpr)
 . }, error = error_handler)
7. mime2repr[[mime]](obj)
8. repr_latex.data.frame(obj)
9. gsub(" &\\", "\\", r, fixed = TRUE)
Warning message in grepl("<html.*>", data[["text/html"]], ignore.case = TRUE):
“input string 1 is invalid in this locale”
row | budget | company | country | director | genre | gross | name | rating | released | runtime | score | star | votes | writer | year
1 | 160000000 | Warner Bros. | USA | Christopher Nolan | Action | 292576195 | Inception | PG-13 | 2010-07-16 | 148 | 8.8 | Leonardo DiCaprio | 1629342 | Christopher Nolan | 2010
1434 | 0 | Aamir Khan Productions | India | Nitesh Tiwari | Action | 12391761 | Dangal | UNRATED | 2016-12-21 | 161 | 8.7 | Aamir Khan | 70565 | Piyush Gupta | 2016
232 | 0 | Quad Productions | France | Olivier Nakache | Biography | 13182281 | The Intouchables | R | 2012-08-24 | 112 | 8.6 | François Cluzet | 579143 | Olivier Nakache | 2011
882 | 165000000 | Paramount Pictures | USA | Christopher Nolan | Adventure | 188020017 | Interstellar | PG-13 | 2014-11-07 | 169 | 8.6 | Matthew McConaughey | 1095553 | Jonathan Nolan | 2014
888 | 3300000 | Bold Films | USA | Damien Chazelle | Drama | 13092000 | Whiplash | R | 2015-01-23 | 107 | 8.5 | Miles Teller | 503754 | Damien Chazelle | 2014
1342 | 0 | Amuse | Japan | Makoto Shinkai | Animation | 5017246 | Your name | PG | 2017-04-07 | 106 | 8.5 | Ryûnosuke Kamiki | 54503 | Makoto Shinkai | 2016
281 | 500000 | Asghar Farhadi Productions | Iran | Asghar Farhadi | Drama | 7098492 | A Separation | PG-13 | 2012-04-20 | 123 | 8.4 | Payman Maadi | 170427 | Asghar Farhadi | 2011
441 | 250000000 | Warner Bros. | UK | Christopher Nolan | Action | 448139099 | The Dark Knight Rises | PG-13 | 2012-07-20 | 164 | 8.4 | Christian Bale | 1253772 | Jonathan Nolan | 2012
442 | 100000000 | The Weinstein Company | USA | Quentin Tarantino | Drama | 162805434 | Django Unchained | R | 2012-12-25 | 165 | 8.4 | Jamie Foxx | 1070691 | Quentin Tarantino | 2012
1251 | 9400000 | Panorama Studios | India | Nishikant Kamat | Crime | 739478 | Drishyam | NOT RATED | 2015-07-31 | 163 | 8.4 | Ajay Devgn | 42547 | Jeethu Joseph | 2015

4.c - According to the score ranking, the top-ranked movies come from companies such as Warner Bros., Aamir Khan Productions, Quad Productions, and Amuse. Because Mr. Spielberg will produce a new movie, I recommend analyzing the data for the two countries that have the most movie data here: the USA and the UK.

In [19]:
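# Keep movies from 2010 onward that were produced in the USA or the UK.
# year is numeric, so it is compared with a number rather than a quoted string.
YCMov <- filter(mov, year >= 2010 & (country == "USA" | country == "UK"))
summary(YCMov)
str(YCMov)
head(YCMov[order(YCMov$score, decreasing = TRUE), ], 10)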
     budget                                            company   
 Min.   :        0   Columbia Pictures                     : 56  
 1st Qu.:  5000000   Universal Pictures                    : 52  
 Median : 20000000   Warner Bros.                          : 48  
 Mean   : 40860228   Paramount Pictures                    : 42  
 3rd Qu.: 50000000   Twentieth Century Fox Film Corporation: 41  
 Max.   :260000000   Summit Entertainment                  : 27  
                     (Other)                               :967  
      country                 director          genre         gross          
 USA      :1049   Woody Allen     :   6   Comedy   :325   Min.   :      441  
 UK       : 184   Clint Eastwood  :   5   Action   :290   1st Qu.:  2461121  
 Argentina:   0   Nicholas Stoller:   5   Drama    :231   Median : 26049082  
 Aruba    :   0   Ridley Scott    :   5   Biography: 92   Mean   : 53943348  
 Australia:   0   Shawn Levy      :   5   Adventure: 80   3rd Qu.: 67061228  
 Austria  :   0   Steven Spielberg:   5   Animation: 69   Max.   :936662225  
 (Other)  :   0   (Other)         :1202   (Other)  :146                      
                  name            rating          released       runtime     
 Concussion         :   2   R        :592   2013-08-23:   7   Min.   : 76.0  
 Frozen             :   2   PG-13    :455   2015-01-23:   7   1st Qu.: 97.0  
 '71                :   1   PG       :141   2015-09-25:   7   Median :104.0  
 10 Cloverfield Lane:   1   NOT RATED: 26   2016-11-18:   7   Mean   :107.2  
 10 Years           :   1   G        : 10   2010-12-17:   6   3rd Qu.:116.0  
 12 Years a Slave   :   1   UNRATED  :  6   2011-10-14:   6   Max.   :180.0  
 (Other)            :1225   (Other)  :  3   (Other)   :1193                  
     score                        star          votes        
 Min.   :1.500   Matt Damon         :  10   Min.   :    980  
 1st Qu.:5.900   Matthew McConaughey:  10   1st Qu.:  27014  
 Median :6.500   Nicolas Cage       :  10   Median :  62768  
 Mean   :6.429   Steve Carell       :  10   Mean   : 115273  
 3rd Qu.:7.000   Adam Sandler       :   9   3rd Qu.: 137119  
 Max.   :8.800   Ben Affleck        :   9   Max.   :1629342  
                 (Other)            :1175                    
                writer          year     
 Allan Loeb        :   6   Min.   :2010  
 Steven Knight     :   6   1st Qu.:2011  
 Woody Allen       :   6   Median :2013  
 Christopher Markus:   5   Mean   :2013  
 Dan Fogelman      :   5   3rd Qu.:2015  
 John Logan        :   5   Max.   :2016  
 (Other)           :1200                 
'data.frame':	1233 obs. of  15 variables:
 $ budget  : num  160000000 13000000 60000000 80000000 0 30000000 20000000 125000000 1500000 40000000 ...
 $ company : Factor w/ 2179 levels "\"DIA\" Productions GmbH & Co. KG",..: 2125 1014 2089 1683 2125 1481 825 2125 111 664 ...
 $ country : Factor w/ 57 levels "Argentina","Aruba",..: 56 56 56 56 54 54 56 56 56 56 ...
 $ director: Factor w/ 2759 levels "\xc1lex de la Iglesia",..: 448 543 718 1711 630 1741 1212 1598 1105 571 ...
 $ genre   : Factor w/ 17 levels "Action","Adventure",..: 1 7 1 12 2 1 5 1 10 4 ...
 $ gross   : num  292576195 106954678 31494270 128012934 295983305 ...
 $ name    : Factor w/ 6731 levels "'71","'night, Mother",..: 2546 737 4345 4478 2221 2765 4445 1098 2565 5935 ...
 $ rating  : Factor w/ 13 levels "B","B15","G",..: 8 9 8 9 8 9 9 8 8 8 ...
 $ released: Factor w/ 2403 levels "1986-01-10","1986-01-17",..: 1849 1883 1855 1812 1876 1824 1815 1822 1905 1865 ...
 $ runtime : int  148 108 112 138 146 117 100 106 103 120 ...
 $ score   : num  8.8 8 7.5 8.1 7.7 7.7 6.4 5.8 6.8 7.7 ...
 $ star    : Factor w/ 2504 levels "'Weird Al' Yankovic",..: 1448 1756 1658 1448 502 13 1029 2089 1853 1099 ...
 $ votes   : int  1629342 594148 302029 883895 368576 465530 107464 241875 226632 521956 ...
 $ writer  : Factor w/ 4199 levels "'Weird Al' Yankovic",..: 667 2565 2701 2327 3764 1700 3603 4024 2378 19 ...
 $ year    : int  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
      budget     company                  country  director           genre      gross      name                     rating  released    runtime  score  star                 votes    writer             year
1     160000000  Warner Bros.             USA      Christopher Nolan  Action     292576195  Inception                PG-13   2010-07-16  148      8.8    Leonardo DiCaprio    1629342  Christopher Nolan  2010
721   165000000  Paramount Pictures       USA      Christopher Nolan  Adventure  188020017  Interstellar             PG-13   2014-11-07  169      8.6    Matthew McConaughey  1095553  Jonathan Nolan     2014
727   3300000    Bold Films               USA      Damien Chazelle    Drama      13092000   Whiplash                 R       2015-01-23  107      8.5    Miles Teller         503754   Damien Chazelle    2014
359   250000000  Warner Bros.             UK       Christopher Nolan  Action     448139099  The Dark Knight Rises    PG-13   2012-07-20  164      8.4    Christian Bale       1253772  Jonathan Nolan     2012
360   100000000  The Weinstein Company    USA      Quentin Tarantino  Drama      162805434  Django Unchained         R       2012-12-25  165      8.4    Jamie Foxx           1070691  Quentin Tarantino  2012
29    200000000  Walt Disney Pictures     USA      Lee Unkrich        Animation  415004880  Toy Story 3              G       2010-06-18  103      8.3    Tom Hanks            603472   John Lasseter      2010
188   25000000   Lionsgate                USA      Gavin O'Connor     Drama      13657115   Warrior                  PG-13   2011-09-09  140      8.2    Tom Hardy            365224   Gavin O'Connor     2011
541   100000000  Red Granite Pictures     USA      Martin Scorsese    Biography  116900694  The Wolf of Wall Street  R       2013-12-25  180      8.2    Leonardo DiCaprio    895552   Terence Winter     2013
917   175000000  Pixar Animation Studios  USA      Pete Docter        Animation  356461711  Inside Out               PG      2015-06-19  95       8.2    Amy Poehler          439792   Pete Docter        2015
1066  30000000   Summit Entertainment     USA      Damien Chazelle    Comedy     151101803  La La Land               PG-13   2016-12-25  128      8.2    Ryan Gosling         308891   Damien Chazelle    2016
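
One detail worth checking in filters like the one above is whether numeric columns such as year and score are compared with numbers or with quoted strings. The following is a minimal sketch in base R on a small made-up data frame (`toy` is hypothetical and not part of movies.csv). When a number is compared with a string, R converts the number to text and compares the two as text, so a two-digit value can sort below a one-digit one.

```r
# Hypothetical toy data, not taken from movies.csv.
toy <- data.frame(
  name  = c("A", "B", "C"),
  score = c(8.8, 6.5, 10),
  stringsAsFactors = FALSE
)

# Numeric comparison: keeps every score of 7 or more.
numeric_keep <- toy$name[toy$score >= 7]

# String comparison: each score is converted to text first, so 10 becomes
# "10" and is compared with "7" as text.
string_keep <- toy$name[toy$score >= "7"]

print(numeric_keep)
print(string_keep)
```

In the summary above the maximum score is 8.8 and every year has four digits, so both forms select the same rows here; the numeric form simply stays correct if a value such as 10 ever appears.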

4d. - We now have the data from 2010 onward for the USA and the UK, and the average score is about 6.4. To focus on the better movies, we filter out the movies whose score is lower than 7.

In [20]:
# Same filter as before, additionally keeping only movies with a score of at least 7.
YCSMov <- filter(mov, year >= 2010 & (country == "USA" | country == "UK") & score >= 7)
head(YCSMov[order(YCSMov$score, decreasing = TRUE), ], 10)
      budget     company                  country  director           genre      gross      name                     rating  released    runtime  score  star                 votes    writer             year
1     160000000  Warner Bros.             USA      Christopher Nolan  Action     292576195  Inception                PG-13   2010-07-16  148      8.8    Leonardo DiCaprio    1629342  Christopher Nolan  2010
200   165000000  Paramount Pictures       USA      Christopher Nolan  Adventure  188020017  Interstellar             PG-13   2014-11-07  169      8.6    Matthew McConaughey  1095553  Jonathan Nolan     2014
206   3300000    Bold Films               USA      Damien Chazelle    Drama      13092000   Whiplash                 R       2015-01-23  107      8.5    Miles Teller         503754   Damien Chazelle    2014
90    250000000  Warner Bros.             UK       Christopher Nolan  Action     448139099  The Dark Knight Rises    PG-13   2012-07-20  164      8.4    Christian Bale       1253772  Jonathan Nolan     2012
91    100000000  The Weinstein Company    USA      Quentin Tarantino  Drama      162805434  Django Unchained         R       2012-12-25  165      8.4    Jamie Foxx           1070691  Quentin Tarantino  2012
16    200000000  Walt Disney Pictures     USA      Lee Unkrich        Animation  415004880  Toy Story 3              G       2010-06-18  103      8.3    Tom Hanks            603472   John Lasseter      2010
49    25000000   Lionsgate                USA      Gavin O'Connor     Drama      13657115   Warrior                  PG-13   2011-09-09  140      8.2    Tom Hardy            365224   Gavin O'Connor     2011
144   100000000  Red Granite Pictures     USA      Martin Scorsese    Biography  116900694  The Wolf of Wall Street  R       2013-12-25  180      8.2    Leonardo DiCaprio    895552   Terence Winter     2013
266   175000000  Pixar Animation Studios  USA      Pete Docter        Animation  356461711  Inside Out               PG      2015-06-19  95       8.2    Amy Poehler          439792   Pete Docter        2015
302   30000000   Summit Entertainment     USA      Damien Chazelle    Comedy     151101803  La La Land               PG-13   2016-12-25  128      8.2    Ryan Gosling         308891   Damien Chazelle    2016

4e. - Then, based on this data, we raise the score threshold to 8 and keep only the movies whose gross is at least their budget. If the gross does not cover the budget, the movie made no profit, and we do not count it as a good movie.

In [21]:
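# Keep movies scoring at least 8 whose gross covers the budget.
YCSGMov <- filter(mov, year >= 2010 & (country == "USA" | country == "UK") & score >= 8 & gross >= budget)
head(YCSGMov[order(YCSGMov$score, decreasing = TRUE), ], 10)
# Pivot table: number of movies per company and score.
qhpvt(YCSGMov, "company", "score", "n()")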
      budget     company                  country  director           genre      gross      name                     rating  released    runtime  score  star                 votes    writer             year
1     160000000  Warner Bros.             USA      Christopher Nolan  Action     292576195  Inception                PG-13   2010-07-16  148      8.8    Leonardo DiCaprio    1629342  Christopher Nolan  2010
19    165000000  Paramount Pictures       USA      Christopher Nolan  Adventure  188020017  Interstellar             PG-13   2014-11-07  169      8.6    Matthew McConaughey  1095553  Jonathan Nolan     2014
23    3300000    Bold Films               USA      Damien Chazelle    Drama      13092000   Whiplash                 R       2015-01-23  107      8.5    Miles Teller         503754   Damien Chazelle    2014
9     250000000  Warner Bros.             UK       Christopher Nolan  Action     448139099  The Dark Knight Rises    PG-13   2012-07-20  164      8.4    Christian Bale       1253772  Jonathan Nolan     2012
10    100000000  The Weinstein Company    USA      Quentin Tarantino  Drama      162805434  Django Unchained         R       2012-12-25  165      8.4    Jamie Foxx           1070691  Quentin Tarantino  2012
5     200000000  Walt Disney Pictures     USA      Lee Unkrich        Animation  415004880  Toy Story 3              G       2010-06-18  103      8.3    Tom Hanks            603472   John Lasseter      2010
13    100000000  Red Granite Pictures     USA      Martin Scorsese    Biography  116900694  The Wolf of Wall Street  R       2013-12-25  180      8.2    Leonardo DiCaprio    895552   Terence Winter     2013
28    175000000  Pixar Animation Studios  USA      Pete Docter        Animation  356461711  Inside Out               PG      2015-06-19  95       8.2    Amy Poehler          439792   Pete Docter        2015
30    30000000   Summit Entertainment     USA      Damien Chazelle    Comedy     151101803  La La Land               PG-13   2016-12-25  128      8.2    Ryan Gosling         308891   Damien Chazelle    2016
3     80000000   Paramount Pictures       USA      Martin Scorsese    Mystery    128012934  Shutter Island           R       2010-02-19  138      8.1    Leonardo DiCaprio    883895   Laeta Kalogridis   2010
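
The gross-versus-budget condition can also be written as an explicit profit column. The sketch below uses base R and three rows copied from the tables above (Inception, Warrior, The Intouchables). Treating a budget of 0 as "unknown" is an assumption made here for illustration only; the data does not say what a budget of 0 means, and `sample_mov` is a hypothetical name.

```r
# Three rows copied from the tables shown earlier in this notebook.
sample_mov <- data.frame(
  name   = c("Inception", "Warrior", "The Intouchables"),
  budget = c(160000000, 25000000, 0),
  gross  = c(292576195, 13657115, 13182281),
  stringsAsFactors = FALSE
)

# Assumption: a budget of 0 means the budget is unknown, so mark it as NA.
sample_mov$budget[sample_mov$budget == 0] <- NA

# Profit is gross minus budget; it stays NA when the budget is unknown.
sample_mov$profit <- sample_mov$gross - sample_mov$budget

# Keep only movies with a known budget that the gross covers.
profitable <- subset(sample_mov, !is.na(profit) & profit >= 0)
print(profitable)
```

Without the NA step, a movie with a budget of 0 always passes a gross >= budget test, so it may be worth deciding how such rows should be handled before reading too much into the filtered list.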

5.) ----------Best choice for Steven Spielberg-------------

5a. -- Best Movie Production Company

-- After filtering out the movies whose score is under 8, we can see which companies have done well recently.

In [22]:
install.packages("ggrepel")
library("ggrepel")
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
In [23]:
YCSGCMov <- YCSGMov %>% group_by(company) %>% summarize(total = n())  # movies per company
ggplot(YCSGCMov) +
  geom_bar(stat = "identity", color = "steelblue", aes(x = company, y = total)) +
  theme(axis.text.x = element_text(size = rel(1), angle = 90))

- Twentieth Century Fox Film Corporation has four movies with a score of 8 or higher between 2010 and 2016. Warner Bros. also has three.
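
To read off the leading companies without relying on the chart, the counts can be sorted. This is a minimal sketch in base R that uses only the company values of the ten rows printed in step 4e, so the counts describe that excerpt and not the full YCSGMov data.

```r
# Company column of the ten rows printed in step 4e (an excerpt only).
company <- c(
  "Warner Bros.", "Paramount Pictures", "Bold Films", "Warner Bros.",
  "The Weinstein Company", "Walt Disney Pictures", "Red Granite Pictures",
  "Pixar Animation Studios", "Summit Entertainment", "Paramount Pictures"
)

# Count movies per company, then sort from most to fewest.
company_counts <- sort(table(company), decreasing = TRUE)
print(head(company_counts, 3))
```

On the full data, the same two steps could be applied to the company column of YCSGMov.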

5b. - Best Writer

- A good movie goes along with a good writer. We find that Craig Borten and Jonathan Nolan are the strongest writers in the USA and the UK.

In [24]:
YCSGWMov <- YCSGMov %>% group_by(writer) %>% summarize(total = n())  # movies per writer
ggplot(YCSGWMov, aes(x = writer, y = total, group = 1)) +
  geom_point(colour = "steelblue") + geom_line() +
  theme(axis.text.x = element_text(size = rel(1), angle = 90))

5c. Best Star

In [25]:
YCSGSMov <- YCSGMov %>% group_by(star) %>% summarize(total = n())  # movies per star
qhpvt(YCSGMov, "star", "score", "n()")

ggplot(YCSGSMov, aes(x = total, y = star, group = 1)) +
  geom_step(colour = "orange") +
  geom_count(colour = "pink") +
  theme(axis.text.x = element_text(size = rel(1), angle = 90))
In [26]:
select_Leona <- subset(mov, star == "Leonardo DiCaprio", select = c(name, score, star, year))
head(select_Leona[order(select_Leona$score, decreasing = TRUE), ], 10)
qhpvt(select_Leona, "year", "score", "n()")
      name                     score  star               year
5281  Inception                8.8    Leonardo DiCaprio  2010
4402  The Departed             8.5    Leonardo DiCaprio  2006
5941  The Wolf of Wall Street  8.2    Leonardo DiCaprio  2013
3524  Catch Me If You Can      8.1    Leonardo DiCaprio  2002
5284  Shutter Island           8.1    Leonardo DiCaprio  2010
4425  Diamante de sangre       8.0    Leonardo DiCaprio  2006
6388  Revenant: El Renacido    8.0    Leonardo DiCaprio  2015
2421  Titanic                  7.8    Leonardo DiCaprio  1997
3521  Gangs of New York        7.5    Leonardo DiCaprio  2002
3992  El aviador               7.5    Leonardo DiCaprio  2004
In [27]:
select_Matthew <- subset(mov, star == "Matthew McConaughey", select = c(name, score, star, year))
head(select_Matthew[order(select_Matthew$score, decreasing = TRUE), ], 10)
qhpvt(select_Matthew, "year", "score", "n()")
      name                 score  star                 year
6162  Interstellar         8.6    Matthew McConaughey  2014
5963  Dallas Buyers Club   8.0    Matthew McConaughey  2013
2220  A Time to Kill       7.4    Matthew McConaughey  1996
5785  Mud                  7.4    Matthew McConaughey  2012
5529  The Lincoln Lawyer   7.3    Matthew McConaughey  2011
4463  We Are Marshall      7.1    Matthew McConaughey  2006
6605  Sing                 7.1    Matthew McConaughey  2016
6757  Free State of Jones  6.9    Matthew McConaughey  2016
5564  Killer Joe           6.7    Matthew McConaughey  2011
6645  Gold                 6.7    Matthew McConaughey  2016
In [28]:
plot_ly(alpha = 0.6) %>%
 add_histogram(x = ~select_Matthew$score, name ="Matthew") %>%
 add_histogram(x = ~select_Leona$score, name =  "Leona") %>%
 layout(barmode = "overlay")
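
Alongside the overlaid histograms, a single number per actor can make the comparison easier to state. The sketch below uses base R and only the ten rows printed above for each actor. These are each actor's ten highest-scoring movies in mov, not their full filmographies, so the means are higher than a mean over all their movies would be.

```r
# Scores of the ten rows printed above for each actor (excerpts only).
leo_scores     <- c(8.8, 8.5, 8.2, 8.1, 8.1, 8.0, 8.0, 7.8, 7.5, 7.5)
matthew_scores <- c(8.6, 8.0, 7.4, 7.4, 7.3, 7.1, 7.1, 6.9, 6.7, 6.7)

# Mean score of each excerpt, rounded to two decimals.
leo_mean     <- round(mean(leo_scores), 2)
matthew_mean <- round(mean(matthew_scores), 2)

cat("Leonardo DiCaprio:", leo_mean, "\n")
cat("Matthew McConaughey:", matthew_mean, "\n")
```

For these excerpts the means are 8.05 and 7.32. On the full data, the same calculation could be run on the score columns of select_Leona and select_Matthew.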

6.)-----------------------Summary-------------------------

According to the report, the mainstream genres are Action, Drama, and Comedy. The factors that affect the score are the number of votes and the gross.

In the last 30 years, Steven Spielberg has made 21 movies: one in France and the others in the USA. According to his data, nine of them are successful, with scores over 7.5. He has a higher average rate of success in the Biography and Drama genres in particular.

The movies “Indiana Jones and the Last Crusade”, “Jurassic Park”, and “Minority Report” are successful in the Action genre. In addition, we know that Steven Spielberg’s favourite movie companies are DreamWorks and Universal Pictures.

There seems to be no particular connection between runtime and score, although the average runtime of a general movie is about 150 minutes. We also analyzed the best writers, stars, and companies for Steven Spielberg, which we discuss in the recommendation.
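
The link between runtime and score can be checked with a correlation coefficient. This is a minimal sketch in base R that uses only the ten rows printed in step 4e. With so few rows the value is illustrative only; on the full data the same call could be applied to the runtime and score columns of mov.

```r
# Runtime (minutes) and score of the ten rows printed in step 4e.
runtime <- c(148, 169, 107, 164, 165, 103, 180, 95, 128, 138)
score   <- c(8.8, 8.6, 8.5, 8.4, 8.4, 8.3, 8.2, 8.2, 8.2, 8.1)

# Pearson correlation: values near 0 suggest no linear relationship,
# values near 1 or -1 a strong one.
r <- cor(runtime, score)
print(round(r, 2))
```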

7.)----------------------Recommendation-----------------------

According to the data, we recommend the stars Leonardo DiCaprio and Matthew McConaughey as the first choice, because in the last six years these two stars have attracted audiences in the USA to go to theaters. The company to cooperate with is DreamWorks: it did a good job with you before, so it may know more about what you want. The runtime should be around 150 minutes, because a longer runtime means spending more budget on production. Thank you.